@dangotbanned dangotbanned commented Oct 14, 2025

Will close #3195

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I'm expecting most of the diff to be from this commit:

The per-backend fix should only take a couple of lines each; alternatively, it could be handled once in nw.Expr

Using 4/16 constructors is enough to demonstrate the bug
@dangotbanned dangotbanned changed the title fix(DRAFT): Always accept `Expr.is_in(other: Iterable) fix(DRAFT): Always accept Expr.is_in(other: Iterable) Oct 14, 2025
@dangotbanned dangotbanned added fix duckdb Issue is related to duckdb backend polars Issue is related to polars backend sqlframe Issue is related to sqlframe backend labels Oct 14, 2025
@dangotbanned dangotbanned changed the title fix(DRAFT): Always accept Expr.is_in(other: Iterable) fix(DRAFT): Always accept {Expr,Series}.is_in(other: Iterable) Oct 14, 2025
Comment on lines +2127 to +2132
class _CanTo_List(Protocol):  # noqa: N801
    def to_list(self, *args: Any, **kwds: Any) -> list[Any]: ...


class _CanToList(Protocol):
    def tolist(self, *args: Any, **kwds: Any) -> list[Any]: ...

FWIW I hate this as well 😂

@dangotbanned dangotbanned Oct 15, 2025

Maybe a better idea would be to ...

Rename _CanTo_List -> ToList, and move to _translate.py alongside:

class ToDict(Protocol[ToDictDT_co]):
    def to_dict(self, *args: Any, **kwds: Any) -> ToDictDT_co: ...

Move these as well, but rename them to reflect that their naming originates from numpy and pyarrow (respectively):

narwhals/narwhals/_utils.py

Lines 2131 to 2132 in 0c66432

class _CanToList(Protocol):
    def tolist(self, *args: Any, **kwds: Any) -> list[Any]: ...

narwhals/narwhals/_utils.py

Lines 2135 to 2136 in 0c66432

class _CanTo_PyList(Protocol):  # noqa: N801
    def to_pylist(self, *args: Any, **kwds: Any) -> list[Any]: ...

The names of the guards can stay the same, since their implementations will (after updating the protocol names) show the link between origin, protocol, and method name:

narwhals/narwhals/_utils.py

Lines 2139 to 2144 in 0c66432

def can_to_list(obj: Any) -> TypeIs[_CanTo_List]:
    return (
        is_narwhals_series(obj)
        or is_polars_series(obj)
        or _hasattr_static(obj, "to_list")
    )

narwhals/narwhals/_utils.py

Lines 2147 to 2148 in 0c66432

def can_tolist(obj: Any) -> TypeIs[_CanToList]:
    return is_numpy_array_1d(obj) or _hasattr_static(obj, "tolist")

narwhals/narwhals/_utils.py

Lines 2151 to 2154 in 0c66432

def can_to_pylist(obj: Any) -> TypeIs[_CanTo_PyList]:
    return (
        (pa := get_pyarrow()) and isinstance(obj, (pa.Array, pa.ChunkedArray))
    ) or _hasattr_static(obj, "to_pylist")
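
As a rough illustration of the rename proposed above, here is a standalone sketch of three structural protocols, one static-lookup guard per method spelling, and a dispatcher that picks whichever conversion the object offers. Only `ToList` is a name taken from the comment; `NumpyToList`, `ArrowToPyList`, `_has_static` and `to_sequence` are hypothetical names used for illustration, and `_has_static` is only a stand-in for what `_hasattr_static` might do, not its actual implementation.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence
from inspect import getattr_static
from typing import TYPE_CHECKING, Any, Protocol

if TYPE_CHECKING:
    # Only needed by type checkers; annotations are not evaluated at runtime.
    from typing_extensions import TypeIs


class ToList(Protocol):
    """`to_list` spelling (polars and narwhals series)."""

    def to_list(self, *args: Any, **kwds: Any) -> list[Any]: ...


class NumpyToList(Protocol):  # hypothetical name
    """`tolist` spelling, which originates from numpy."""

    def tolist(self, *args: Any, **kwds: Any) -> list[Any]: ...


class ArrowToPyList(Protocol):  # hypothetical name
    """`to_pylist` spelling, which originates from pyarrow."""

    def to_pylist(self, *args: Any, **kwds: Any) -> list[Any]: ...


_MISSING = object()


def _has_static(obj: Any, name: str) -> bool:
    # Stand-in for `_hasattr_static`: look the attribute up without
    # invoking `__getattr__` or evaluating descriptors.
    return getattr_static(obj, name, _MISSING) is not _MISSING


def can_to_list(obj: Any) -> TypeIs[ToList]:
    return _has_static(obj, "to_list")


def can_tolist(obj: Any) -> TypeIs[NumpyToList]:
    return _has_static(obj, "tolist")


def can_to_pylist(obj: Any) -> TypeIs[ArrowToPyList]:
    return _has_static(obj, "to_pylist")


def to_sequence(obj: Iterable[Any]) -> Sequence[Any]:
    """Hypothetical dispatcher: prefer an object's own conversion method."""
    if isinstance(obj, (list, tuple)):
        return obj  # already a sequence, returned unchanged
    if can_to_list(obj):
        return obj.to_list()
    if can_tolist(obj):
        return obj.tolist()
    if can_to_pylist(obj):
        return obj.to_pylist()
    return tuple(obj)  # generic fallback for any other iterable
```

The guard names mirror the method they look for, so each guard, protocol and method spelling can be traced back to the library the spelling comes from.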

@FBruzzesi FBruzzesi Oct 18, 2025

I updated my comment (#3207 (comment)) - I would argue that native series are not ok, while numpy 1d arrays are.

My thought process for this is that if someone is doing something along the lines of:

import narwhals as nw
import polars as pl
from narwhals.typing import IntoDataFrameT, IntoSeriesT

def agnostic_func(frame: IntoDataFrameT) -> IntoDataFrameT:
    other = pl.Series([1, 2, 3])  # <- notice how this is a native series!!!
    return nw.from_native(frame).filter(nw.col("x").is_in(other)).to_native()

then the function is clearly not agnostic and polars would be required in this case.

A different case would be if a narwhals series with a different backend is provided. This could mean that the function is agnostic but the user is mixing backends:

def is_left_in_right(left_series: IntoSeriesT, right_series: IntoSeriesT) -> IntoSeriesT:
    left_nw = nw.from_native(left_series, series_only=True)
    right_nw = nw.from_native(right_series, series_only=True)
    return left_nw.is_in(right_nw).to_native()

# but now it is the user mixing backends, not the library itself
import pandas as pd

is_left_in_right(pl.Series([1, 2, 3]), pd.Series([0, 1]))

This is the case where I suggested we yell at the user with a warning.


From our side, I think it would greatly simplify (read as, get rid of) most of the protocols here, the type guards, and the iterable_to_sequence function.

Comment on lines +427 to +437
_into_iter: Callable[[int], Iterator[IntoIterable]] = _into_iter_selector()
"""`into_iter` fixtures use the suffix `_<n>` to denote the maximum number of constructors.
Anything greater than **10** may return less depending on available dependencies.
"""


@pytest.fixture(params=_into_iter(16), scope="session", ids=_ids_into_iter)
def into_iter_16(request: pytest.FixtureRequest) -> IntoIterable:
    function: IntoIterable = request.param
    return function

Kinda outdated now since (633a06d)

narwhals/expr.py Outdated
Comment on lines 993 to 1001
        if isinstance(other, Iterable) and not isinstance(other, (str, bytes)):
            other = other.to_native() if is_series(other) else iterable_to_sequence(other)
            return self._with_elementwise(
                lambda plx: self._to_compliant_expr(plx).is_in(
                    to_native(other, pass_through=True)
                )
                lambda plx: self._to_compliant_expr(plx).is_in(other)

@dangotbanned dangotbanned Oct 14, 2025

I've updated this but it is still wrong.
Didn't notice the issue until I started trying to add typing to the compliant-level

def is_in(self, other: Any) -> Self: ...

I don't understand why we were allowing any kind of Native* to be passed to every backend? 🤔

Gonna add another test for at least the more common case of a Series from the wrong backend

Updated

test: Add test_expr_is_in_series_wrong_backend

I'm not sure of a safe way to keep the same behavior (if it is desired).
Since we don't know the backend at this stage, the options I see are:

  1. Unconditionally convert to list | tuple
  2. Raise elsewhere when a Series is passed to a lazy backend
  3. Disallow Expr.is_in(other: Series)
  4. Do nothing
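
For reference, option 1 could look roughly like the sketch below. `normalize_is_in_other` is a hypothetical helper, not something in this PR, and raising `TypeError` for strings and non-iterables is an assumption about the desired behavior. It shows the tradeoff directly: every backend receives a plain list or tuple, at the cost of materializing the input.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def normalize_is_in_other(other: Any) -> list[Any] | tuple[Any, ...]:
    """Hypothetical helper: always hand the backend a list or tuple."""
    # str and bytes are iterable, but treating them as a collection of
    # characters is rarely what the caller meant, so they are rejected.
    if isinstance(other, (str, bytes)) or not isinstance(other, Iterable):
        msg = f"Expected a non-string iterable, got: {type(other).__name__!r}"
        raise TypeError(msg)  # assumption: the real error type may differ
    if isinstance(other, (list, tuple)):
        return other  # already in the target form, no copy needed
    # Materializes generators, sets, ranges, dict views, series, ...
    return tuple(other)
```

With this approach a series from any backend is iterated element by element, which is the safe but slower path described above.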

@dangotbanned dangotbanned Oct 14, 2025

@MarcoGorelli @FBruzzesi
do you guys have any preference on a path forward here?

This case is a bit different to some of the other places we'd reject nw.Series for lazy backends - since the length isn't an issue.
But the safer option of converting all nw.Series is gonna be less performant than the currently unsafe version (which only works on a matching eager implementation)

Everything seems like a tradeoff to me 😔

I've moved out of draft for visibility (#3207 (comment))

@FBruzzesi FBruzzesi Oct 18, 2025

Hey @dangotbanned I will try to take a look during the weekend, but I am not sure I can manage.

My thoughts from the context in the messages here (I didn't open the code changes just yet):

  • Realistically, I don't think it's common to pass a series from a different backend, but we should not exclude that possibility
  • My opinion would be to:
    • Use the native series if isinstance(other, Series) and expr._implementation == other._implementation
    • Otherwise convert it to a list (we can do that with native methods thankfully), but warn that such a conversion is happening with a UserWarning. This case includes a series passed to a lazy backend

Update: Regarding native series, I would prefer to raise an exception in that case
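
A minimal sketch of the rule proposed in this comment, using stand-in classes rather than the real narwhals types. `FakeSeries`, `FakeNativeSeries` and `resolve_is_in_other` are hypothetical, and the sketch assumes the expression's implementation is already known at the point where it runs, which is not the case at the `nw.Expr` level today.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeSeries:
    """Stand-in for a narwhals Series (not the real class)."""

    implementation: str
    values: list[Any]

    def to_native(self) -> Any:
        return self.values  # pretend this is the backend's own series

    def to_list(self) -> list[Any]:
        return list(self.values)


class FakeNativeSeries:
    """Stand-in for a native series such as `pl.Series` (hypothetical)."""


def resolve_is_in_other(expr_implementation: str, other: Any) -> Any:
    """Pick what to hand the backend for `is_in(other)`."""
    if isinstance(other, FakeNativeSeries):
        # Per the update: native series are rejected outright.
        msg = "Pass a narwhals Series or a plain iterable, not a native series."
        raise TypeError(msg)
    if isinstance(other, FakeSeries):
        if other.implementation == expr_implementation:
            return other.to_native()  # fast path: same backend
        warnings.warn(
            f"Converting a {other.implementation!r} series to a list for a "
            f"{expr_implementation!r} expression.",
            UserWarning,
            stacklevel=2,
        )
        return other.to_list()  # safe path: mismatched or lazy backend
    return other  # other iterables are left for separate handling
```

A series passed to a lazy backend falls into the warning branch, since its implementation can never match an eager one.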

@dangotbanned dangotbanned changed the title fix(DRAFT): Always accept {Expr,Series}.is_in(other: Iterable) fix: Always accept {Expr,Series}.is_in(other: Iterable) Oct 14, 2025
@dangotbanned dangotbanned added help wanted Extra attention is needed and removed duckdb Issue is related to duckdb backend polars Issue is related to polars backend sqlframe Issue is related to sqlframe backend labels Oct 14, 2025
@dangotbanned dangotbanned marked this pull request as ready for review October 18, 2025 11:29
Development

Successfully merging this pull request may close these issues.

Expr.is_in(Iterable) raises inconsistently

3 participants